
Context Engineering

Context engineering is the practice of designing a system, composed of many components, that supplies an LLM with the right information

What is Context?

  • Context refers to all information a Large Language Model (LLM) processes before generating a response. It shapes the model’s understanding and output quality, acting as the foundation for effective task performance.

Components of Context

  • System Prompt:
    • Instructions guiding the LLM on its behavior, including how to respond and restrictions to follow.
  • User Prompt:
    • The specific query or instruction provided by the user.
  • Conversation History (Short-Term Memory):
    • The ongoing dialogue between the user and the LLM within a session.
  • Long-Term Memory:
    • Accumulated data from multiple conversations, including user preferences or historical interactions.
  • Retrieval-Augmented Generation (RAG):
    • Up-to-date, external, relevant information fetched to enhance response accuracy.
  • Tool Definitions:
    • Descriptions of tools available to the LLM, specifying their functionality and usage.
  • Output Schema Definition:
    • Specifications for the format of the LLM’s output (e.g., JSON, text).
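The components above can be sketched as a single assembly step. This is a minimal illustration, not any particular SDK's API; `build_context` and the message/tool shapes are assumptions modeled loosely on common chat-completion formats.

```python
import json

def build_context(system_prompt, user_prompt, history=None,
                  long_term_memory=None, retrieved_docs=None,
                  tool_definitions=None, output_schema=None):
    """Combine the context components into one request payload."""
    system_parts = [system_prompt]
    if long_term_memory:
        system_parts.append("Known user preferences:\n" + long_term_memory)
    if retrieved_docs:  # RAG results
        system_parts.append("Relevant documents:\n" + "\n".join(retrieved_docs))
    if output_schema:  # output schema definition
        system_parts.append("Respond with JSON matching:\n" + json.dumps(output_schema))
    messages = [{"role": "system", "content": "\n\n".join(system_parts)}]
    messages += history or []            # short-term memory
    messages.append({"role": "user", "content": user_prompt})
    return {"messages": messages, "tools": tool_definitions or []}

ctx = build_context(
    "You are a scheduling assistant.",
    "Book a meeting with Dana on Friday.",
    long_term_memory="User prefers mornings.",
    tool_definitions=[{"name": "calendar_lookup",
                       "description": "Check free calendar slots"}],
)
```

Every component lands in a well-defined slot, which makes it possible to add, prune, or swap pieces per request rather than hand-editing one monolithic prompt.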

Importance of Context

  • The quality of an LLM’s response is directly tied to the quality of its context. A rich context enables more accurate and relevant outputs, while poor or absent context leads to suboptimal responses.
  • Example: AI Personal Assistant
    • A context-rich assistant can:
      • Access calendar data to check availability for scheduling.
      • Review past email history to match communication tone with a recipient.
      • Use an email-sending tool to execute actions.
    • In contrast, an assistant with minimal context may misinterpret queries or provide generic, less useful responses.

Context Engineering vs. Prompt Engineering

  • Prompt Engineering:
    • Focuses on crafting a single, precise instruction set within a text prompt.
    • Static approach, emphasizing clarity and specificity in one input.
  • Context Engineering:
    • Involves designing a dynamic system that curates and delivers relevant information to the LLM for task completion.
    • Operates as a preprocessing layer before the LLM call, tailoring context dynamically based on the query.
    • Example: For one request, the system might fetch calendar data; for another, it might perform a web search.
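The preprocessing layer described above can be sketched as a simple router. The routing rule (keyword matching) and the `fetch_calendar` / `web_search` placeholders are assumptions; a real system would dispatch to live APIs or an LLM-based classifier.

```python
def fetch_calendar():
    # Placeholder for a real calendar API call.
    return ["Fri 10:00 free", "Fri 14:00 busy"]

def web_search(query):
    # Placeholder for a real search API call.
    return [f"Top result for: {query}"]

def route_context(query: str) -> dict:
    """Decide, per query, which context to fetch before the LLM call."""
    q = query.lower()
    if any(w in q for w in ("schedule", "meeting", "calendar")):
        return {"source": "calendar", "data": fetch_calendar()}
    if any(w in q for w in ("news", "latest", "today")):
        return {"source": "web_search", "data": web_search(query)}
    return {"source": "none", "data": []}
```

The key point is that the context is assembled dynamically per request, in contrast to the static single prompt of prompt engineering.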

Challenges with Long Contexts

  • While comprehensive context improves responses, excessively long contexts (e.g., millions of tokens, multiple tools, or extensive documents) can degrade performance. Issues include:
    • Distraction: Irrelevant information diverts the LLM’s focus.
    • Confusion: Overloaded context overwhelms the model, reducing clarity.
    • Context Clashes: Contradictory information within the context leads to inconsistent outputs.
  • Source: How Long Contexts Fail | Drew Breunig

Context Management Tactics

  • Effective context management follows the principle of “garbage in, garbage out”: high-quality, relevant context yields accurate responses. Below are key tactics to optimize context, inspired by How to Fix Your Context | Drew Breunig.

  • Retrieval-Augmented Generation (RAG)

    • Description: Selectively fetches relevant, up-to-date external information to enrich the LLM’s context.
    • Benefit: Enhances response accuracy without overloading the model.
    • Example: For a query about recent news, RAG retrieves current articles instead of relying on static knowledge.
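A minimal retrieval sketch, assuming term-overlap scoring stands in for the embedding similarity a production RAG system would use: score each document against the query and include only the top match in the context.

```python
def retrieve(query, documents, top_k=1):
    """Return the top_k documents most relevant to the query."""
    q_terms = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_terms & set(d.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

docs = [
    "The 2024 election results were announced yesterday.",
    "Photosynthesis converts sunlight into chemical energy.",
]
context = retrieve("latest election results", docs)
```

Only the relevant document enters the prompt; the rest of the corpus never consumes tokens.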
  • Tool Loadout Selection

    • Description: Chooses only relevant tool definitions to include in the context, avoiding unnecessary complexity.
    • Approaches:
      • RAG for Tools: Use a retrieval system to match tools to the user’s query (e.g., selecting a calendar tool for scheduling queries).
      • LLM-Powered Recommender: Prompt the LLM to identify which tools are needed for a task.
    • Consideration: The number of tools impacts performance:
      • Some models handle 20–30 tools effectively but struggle with hundreds.
    • Example: For a query about scheduling, include only the calendar tool definition, not irrelevant tools like web search.
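The "RAG for Tools" approach can be sketched the same way: match tool descriptions against the query and include only the winners. The tool registry and scoring rule here are illustrative assumptions, not a real framework's API.

```python
TOOLS = {
    "calendar_lookup": "Check calendar availability and free meeting slots",
    "send_email": "Send an email to a recipient",
    "web_search": "Search the web for current information",
}

def select_tools(query, max_tools=2):
    """Return only tool names whose descriptions overlap the query."""
    q_terms = set(query.lower().split())
    scores = {
        name: len(q_terms & set(desc.lower().split()))
        for name, desc in TOOLS.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [t for t in ranked[:max_tools] if scores[t] > 0]

loadout = select_tools("find a free meeting slot")
```

With hundreds of registered tools, only a handful of definitions ever reach the model, which sidesteps the degradation noted above.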
  • Context Quarantine

    • Description: Isolates different contexts in separate processing threads, each with specific tools or data.
    • Benefit: Prevents cross-contamination of contexts and reduces the risk of conflicting instructions.
    • Example: Break a complex query into parallel tasks (e.g., one thread handles calendar data, another handles email drafting).
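A sketch of quarantine: each subtask runs with its own isolated context (system prompt, tools, query), in parallel, and only the results are merged. The subagent and its output format are hypothetical; the LLM call itself is elided.

```python
from concurrent.futures import ThreadPoolExecutor

def run_subagent(task):
    # Each subagent sees ONLY its own context; nothing leaks between threads.
    context = {"system": task["system"], "tools": task["tools"],
               "user": task["query"]}
    return f"[{task['name']}] ran with tools {context['tools']}"

subtasks = [
    {"name": "scheduling", "system": "You manage calendars.",
     "tools": ["calendar_lookup"], "query": "Find a slot on Friday."},
    {"name": "drafting", "system": "You write emails.",
     "tools": ["send_email"], "query": "Draft the meeting invite."},
]

with ThreadPoolExecutor() as pool:
    results = list(pool.map(run_subagent, subtasks))
```

Because the calendar thread never sees the email-drafting instructions (and vice versa), neither context can contradict or distract the other.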
  • Context Pruning

    • Description: Removes irrelevant or redundant information from the context to maintain focus.
    • Benefit: Reduces noise, improving model efficiency and response accuracy.
    • Example: Exclude outdated conversation history unrelated to the current query.
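Pruning can be sketched as a relevance filter over the history. The term-overlap heuristic here is a stand-in; in practice an LLM or embedding model would judge relevance.

```python
def _terms(text):
    # Crude tokenization: strip punctuation, drop very short words.
    return {w.strip(".,?!").lower() for w in text.split() if len(w) > 3}

def prune_history(history, query):
    """Keep only history turns that share terms with the current query."""
    q = _terms(query)
    return [m for m in history if q & _terms(m["content"])]

history = [
    {"role": "user", "content": "What is the weather in Paris?"},
    {"role": "user", "content": "Schedule a meeting for Friday."},
]
pruned = prune_history(history, "move the Friday meeting to 3pm")
```

The unrelated weather turn is dropped before the LLM call, so it cannot distract the model from the scheduling task.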
  • Context Summarization

    • Description: Condenses large contexts (e.g., long conversation histories) into concise summaries.
    • Benefit: Maintains essential information while reducing token count and complexity.
    • Example: Summarize a multi-turn conversation into key points for the LLM to reference.
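A summarization sketch: keep the most recent turns verbatim and collapse everything older into one summary message. The naive string join is a placeholder; in practice you would ask the LLM itself to produce the summary.

```python
def summarize_history(history, keep=2):
    """Replace all but the last `keep` turns with a summary message."""
    if len(history) <= keep:
        return history
    old, recent = history[:-keep], history[-keep:]
    summary = "; ".join(m["content"] for m in old)  # stand-in for an LLM summary
    return [{"role": "system",
             "content": f"Summary of earlier conversation: {summary}"}] + recent

history = [
    {"role": "user", "content": "I need to plan a team offsite."},
    {"role": "assistant", "content": "Sure, what dates work?"},
    {"role": "user", "content": "Sometime in June."},
    {"role": "assistant", "content": "June 12-13 looks open."},
]
compressed = summarize_history(history)
```

The token count shrinks with each compression, while the essential facts survive in the summary turn.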
  • Context Offloading

    • Description: Stores information externally via tools, retrieving it only when needed, rather than including it in the LLM’s context.
    • Benefit: Reduces context size, especially for data not immediately relevant to the query.
    • Example: Use a “think” tool to store intermediate data, as described in The "think" tool: Enabling Claude to stop and think | Anthropic.
    • Use Case: When the model requires external data (e.g., database results) to formulate a response, offload processing to a tool that manages the data.
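Offloading can be sketched as an external scratchpad exposed to the model as a pair of tools: write intermediate data out of the context, read it back only when a later step needs it. The `Scratchpad` class is an illustrative assumption, not Anthropic's implementation of the "think" tool.

```python
class Scratchpad:
    """External store the model accesses via tool calls, not via the prompt."""

    def __init__(self):
        self._notes = {}

    def write(self, key, value):
        # Tool call: park intermediate data outside the context window.
        self._notes[key] = value
        return f"stored '{key}'"

    def read(self, key):
        # Tool call: pull the data back in only when it is needed.
        return self._notes.get(key, "")

pad = Scratchpad()
pad.write("db_results", "42 rows matching the filter")
# A later turn retrieves it; only now does the data enter the context.
prompt_fragment = f"Relevant data: {pad.read('db_results')}"
```

The intermediate rows never sit in the context across turns; they cost tokens only in the single turn that consumes them.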

Sources

  • How Long Contexts Fail | Drew Breunig
  • How to Fix Your Context | Drew Breunig
  • The "think" tool: Enabling Claude to stop and think | Anthropic